Selecting Text Spans for Document Summaries: Heuristics and Metrics
نویسندگان
چکیده
Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents an analysis of newsarticle summaries generated by sentence extraction. Sentences are ranked for potential inclusion in the summary using a weighted combination of linguistic features – derived from an analysis of news-wire summaries. This paper evaluates the relative effectiveness of these features. In order to do so, we discuss the construction of a large corpus of extractionbased summaries, and characterize the underlying degree of difficulty of summarization at different compression levels on articles in this corpus. Results on our feature set are presented after normalization by this degree of difficulty.
منابع مشابه
Feasibility of Using
Feasibility of Using Citations as Document Summaries Jeff Hand Carl Drott, Ph.D. The purpose of this research is to establish whether it is feasible to use citations as document summaries. People are good at creating and selecting summaries and are generally the standard for evaluating computer generated summaries. Citations can be characterized as concept symbols or short summaries of the docu...
متن کاملThe Polish Summaries Corpus
This article presents the Polish Summaries Corpus, a new resource created to support the development and evaluation of the tools for automated single-document summarization of Polish. The Corpus contains a large number of manual summaries of news articles, with many independently created summaries for a single text. Such approach is supposed to overcome the annotator bias, which is often descri...
متن کاملEXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS
Due to the explosive growth of the world-wide web, automatictext summarization has become an essential tool for web users. In this paperwe present a novel approach for creating text summaries. Using fuzzy logicand word-net, our model extracts the most relevant sentences from an originaldocument. The approach utilizes fuzzy measures and inference on theextracted textual information from the docu...
متن کاملLearning Summary Content Units with Topic Modeling
In the field of multi-document summarization, the Pyramid method has become an important approach for evaluating machine-generated summaries. The method is based on the manual annotation of text spans with the same meaning in a set of human model summaries. In this paper, we present an unsupervised, probabilistic topic modeling approach for automatically identifying such semantically similar te...
متن کاملToward a Gold Standard for Extractive Text Summarization
Extractive text summarization is the process of selecting relevant sentences from a collection of documents, perhaps only a single document, and arranging such sentences in a purposeful way to form a summary of this collection. The question arises just how good extractive summarization can ever be. Without generating language to express the gist of a text – its abstract – can we expect to make ...
متن کامل